Skip to content

Add modules utils under mix_utils folder to avoid duplication in JSON… #36

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Mar 12, 2024

Conversation

fabrizio-turchi
Copy link
Contributor

Add the utils.py module under mix_utils folder to:

  1. avoid duplication in the JSON/CASE file generated by the parsers (UFED, AXIOM etc.)
  2. amends the data to either make it homogeneous or get rid of some dirty characters extracted from the XML reports.

The module is shared among all parsers (UFED, AXIOM, OXYGEN and XAMN/XRY)

@ajnelson-nist
Copy link
Member

Hi @fabrizio-turchi ,

If you are in progress on pushing changes, please leave this PR in Draft status until you are done and ready for it to be reviewed for merging.

Also, your first commit includes a compiled .pyc file. Can you please git rm that, and also include a patch updating .gitignore to ignore __pycache__ and *.pyc? I leave it up to you if you want to revise the history to purge that file.

@fabrizio-turchi
Copy link
Contributor Author

fabrizio-turchi commented Feb 20, 2024 via email

Copy link
Member

@ajnelson-nist ajnelson-nist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  • .gitignore should prevent __pycache__ and *.pyc files from being committed. It might also be good to purge those files from the first patch, whether in a force-push on this PR or in a separate PR.
  • The adjust_json function looks like it could use further explanation because of in-memory data modification. I'd personally prefer a doctest be added and run as part of the module test suite if we confirm the behavior's necessary. (See these lines for a doctest demonstration.)

Copy link
Member

@ajnelson-nist ajnelson-nist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a pytest is needed. There might be runtime issues in some configurations of Python, e.g. Python 3.8.

@fabrizio-turchi
Copy link
Contributor Author

@ajnelson-nist I turned the class into a function at model level. The test_duplicate.py contains all the ways I invoke the code in the UFED parser. The JSON objects taken into account in the test are a simplified version of the ones included in the parser code but it doesn't affect the result of the test. Feel free to make any kind of comment for the benefit of the final result.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Copy link
Member

I realized a few things running pytest locally. The GitHub Action isn't running on this pull request; it seems we'd forgotten to turn on Actions for pull requests. I'm pushing that change to your main now.

Also, thank you again for using the typing symbols from Python <3.9. But, I'd forgotten that 3.8 is not supported by this package. The CASE validation action only operates in 3.9 or later, and the pyproject.toml file already specifies needing 3.9.

Thank you for the pytest. I was expecting it would operate on your new module as a package, but where you put the test made me realize your code isn't positioned to be a part of the cdo_mapping package. With where the code is, you and any end-users would only be able to use it by GIt submodule incorporation. The mix_utils directory would need to move to under /case_mapping/ to be exposed as part of the package.

The way to see this is to have the file test_duplicate.py move to under /tests, and then get its import line working again. My understanding of your intended use case is that your new module should be usable with an import like how /example.py has the line import case_mapping.

Last, be aware the node identifier style resolves to "@id": str(uuid.uuid4()), and without a @base statement in a graph-wide context dictionary, that attempted @id statement will not work as JSON-LD. (I forget if it will be silently dropped because the UUID is just a string, or if it will break in some other way, but case_validate would fail validation on the generated JSON because, at best, all of the nodes would be blank nodes.) I appreciate this test is designed to not think about the entire graph, but the @ids should at least read (after all the variables are resolved) "@id": "http://example.org/kb/" + str(uuid.uuid4()). kb: could also be fine as a prefix, as that's what's used almost everywhere across the CASE examples.

ajnelson-nist and others added 2 commits February 23, 2024 07:57
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@fabrizio-turchi
Copy link
Contributor Author

@ajnelson-nist I tried to use the example module to test the FacetUrlHistory and UrlHistoryEntry classes, but it raises the error: "PackageNotFoundError: case-mapping". Inside the init.py of the package there's a reference to case-mapping but the folder is case_mapping. Could you have a look to it?

@ajnelson-nist
Copy link
Member

@fabrizio-turchi : I have a guess why you're seeing that, but first, it would be helpful for review purposes if you would run make in your development environment. This will install pre-commit, by running this recipe. Then if you make any unimportant change to case_mapping/uco/observable.py---say, adding a new blank line at the end of the file---and attempt to commit the change, your code will be formatted as the Continuous Integration in this repository requires. You'll need to commit the changes pre-commit makes, but they should just be code formatting.

Back to your question - it sounds like the library is not installed in your testing environment. This is a reasonable expectation from tests being in separate directories from the package source.

If you run make, a virtual environment will be built (separate from the one pre-commit is installed into - the background on why is in this PR if you're curious), and the local case_mapping package will be installed into it. Activating that virtual environment will let you run pytest:

make
source venv/bin/activate  # Begin use of virtual environment
pytest
deactivate  # End use of virtual environment

make check in this project happens to include running pytest, in this recipe. (The commands pytest and poetry run pytest happen to do the same thing for me.)

If you follow the above steps, you'll find there was a missed str() cast in your last update, but you shouldn't have any trouble with packages.

@fabrizio-turchi
Copy link
Contributor Author

@ajnelson-nist I wouldn't like to bother you about the error I run into yesterday.
I changed the example.py adding the FacetUrlHistory and UrlHistoryEntry Observables, but I had to change the line version = importlib.metadata.version("case-mapping") with version = "xxx". In this way I could run the example to test the changes and later I restored the line.
The error remains the same inside the package: importlib.metadata.PackageNotFoundError: case-mapping

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Reported-by: Keith Chason <kchason@users.noreply.github.com>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@fabrizio-turchi
Copy link
Contributor Author

@ajnelson-nist I changed the code in the FacetUrlHistory class to fix the errors raised by case_validate. I didn't find a better way to fill in the uco-observable:urlHistoryEntry property. The code is borrowed by the _str_vars, _str_datetime ... methods of the FacetEntity class, but I couldn't use those methods. If you find a more elegant solution, do not hesitate to change teh code or suggest to me how to do it. 10x!

JSON-LD needs the value with the `@value` key to be a JSON string.  Some
undesired type-coercion could happen if ints or floats are used.

Python type signatures are added to check data flow consistency.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
If `Tuple[str, ...]` is used, then each individual positional argument
must be a tuple of strings.  See PEP-484.

`Any` is used instead, to allow for `float`s in the accompanying unit
test, and `dict`s in future applications.

With these changes, `/mix_utils` passes `mypy`.

References:
* https://peps.python.org/pep-0484/#arbitrary-argument-lists-and-default-argument-values

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
With these changes, `/mix_utils` passes `mypy --strict`.

Disclaimer:
Participation by NIST in the creation of the documentation of mentioned
software is not intended to imply a recommendation or endorsement by the
National Institute of Standards and Technology, nor is it intended to
imply that any specific software is necessarily the best available for
the purpose.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
The migrated test script demonstrates the module import pattern for
consumers.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Copy link
Member

@fabrizio-turchi : I've moved some files around so you can meet your original goal of getting a shareable, importable library. /mix_utils/utils.py has moved to case_mapping/mix_utils.py, because otherwise you would not be able to import utils.py as a module. The test you'd added under /mix_utils has moved to /tests, to demonstrate that once case_mapping is installed in a Python environment, mix_utils.py is accessible by import.

I've also added Python type reviews on all of the files that were under /mix_utils, because I caught one subtle JSON-LD issue (a necessary string casting) and I wasn't quite following some of the data-type flow without a signature being written.

If you look at the git log --stat messages, you'll see why I added those changes.

I'm approving this PR. If you agree with the changes, please feel free to merge, and then I'll start reviewing #38 .

Note: I did not review changes pertaining to other Issues. I only checked that the unit testing (case_validate, pytest, and mypy --strict on the mix-utils files) passes. I still think it is fine to merge this PR so 38 can be returned to, and if you think other Issues are resolved with the extra changes in this PR, please feel free to close them as addressed.

@fabrizio-turchi fabrizio-turchi merged commit 4ad2ddb into casework:main Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants